ABSTRACT

A  statistical  procedure  is  described  for  comparing  two  lifetime 
distributions  when  the  data  is  reviewed  repeatedly  over  time.  The 
procedure  provides  the  capability  of  early  decision  while  maintaining 
both  a  fixed  significance  level  and  a  fixed  maximum  length  for  the 
entire  experiment.  The  effects  which  staggered  entry,  number  of  looks 
at  the  data  and  maximum  test  length  have  on  power  and  expected  test 
length are discussed. An application is made to a bearing fatigue-life
test.

I. INTRODUCTION


In  the  field  of  reliability,  it  is  often  necessary  to  conduct 
experiments  in  order  to  compare  two  lifetime  distributions.  Examples 
include comparing lifetimes of similar products from different
manufacturers or comparing similar products which meet different quality
control  standards.  Due  to  the  length  of  such  "life  tests,"  it  is 
usually  desirable  to  review  the  data  periodically  over  the  course  of 
the  experiment.  If  the  difference  between  the  two  groups  is  large,  it 
may  be  possible  to  reach  an  early  conclusion  and  truncate  the  test  with 
considerable  savings  in  time  and  expense. 

When data is reviewed periodically, the probabilities associated
with classical significance tests are not appropriate, because each
additional look at the data provides another opportunity to reject a
true null hypothesis. Wald [3] developed sequential procedures wherein
the data is reviewed periodically with a fixed significance level for
the entire test. The Wald type sequential tests are useful primarily
when the response time is short relative to the entry period of items
into the experiment [1]. In the area of reliability, response time (for
example, fatigue life of bearings) can be several years.

Canner  [1]  developed  a  statistical  procedure  for  relatively  short 
tests,  which  is  a  compromise  between  the  fixed  length  and  the  sequential 
tests.  The  "Canner  test"  provides  the  capability  of  early  decision  while 
maintaining  both  a  fixed  significance  level  and  a  fixed  maximum  length 
for the entire experiment. Items can enter the experiment simultaneously
or with a staggered entry following the uniform distribution. (Actually
any entry distribution can be assumed, but only the simultaneous and
uniform cases are considered here.)

Although  the  Canner  test  was  originally  developed  for  the  analysis 
of  clinical  trials,  there  is  no  reason  to  limit  its  use  to  medical 
applications.  In  this  report,  the  Canner  test  is  described  in  a 
reliability  setting.  After  some  of  Canner's  results  are  mentioned, 
additional  properties  of  the  test  are  investigated.  In  particular,  the 
power  and  expected  test  length  are  approximated  for  a  variety  of  conditions 
and  the  effects  which  various  experimental  design  factors  have  on  power 
and  expected  test  length  are  discussed. 


II.  THE  CANNER  TEST 


The control group, group 1, and the experimental group, group 2,
contain $n_1$ and $n_2$ items respectively. Let $T_{ij}$ denote the times
of entry into the experiment, assumed to be uniformly distributed over the
first $T_E$ years, let $R_{ij}$ denote the failure times (in years) and let
$T_F$ denote the maximum length of the test (in years). Here, as in the
remainder of the report, i indicates group and j indicates item
($i = 1, 2$; $j = 1, 2, \ldots, n_i$). For example, $T_{ij}$ denotes the
entry time of the jth item from the ith group. For ease of writing, this
item will be referred to as the (i, j)th item.

The response variable is item lifetime, which is assumed to follow a
Weibull distribution with known shape parameter, v. Although Canner [1]
considers only the exponential distribution, which is Weibull with v = 1,
the more general case can be considered by utilizing the fact that if X
is distributed as a Weibull random variable with scale parameter $\gamma$
and shape parameter v, then $X^v$ is distributed as an exponential random
variable with parameter $\gamma^v$ (mean $\gamma^{-v}$), taking the Weibull
distribution function to be $1 - e^{-(\gamma x)^v}$ as in the simulation
procedure outlined later in this section.
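
The transformation can be checked in one line. The sketch below assumes the
Weibull distribution function is written as $1 - e^{-(\gamma x)^v}$, which is
the form used in step 3 of the simulation outline in this section; with the
other common convention, $1 - e^{-(x/\gamma)^v}$, the same steps give an
exponential variable with mean $\gamma^v$.

```latex
P(X^v \le w) = P\bigl(X \le w^{1/v}\bigr)
             = 1 - e^{-(\gamma w^{1/v})^v}
             = 1 - e^{-\gamma^v w}, \qquad w \ge 0 .
```

Under the stated assumption, $X^v$ is therefore exponential with parameter
$\gamma^v$.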

At time t (years), $0 \le t \le T_F$, the (i, j)th item is in one of three
states:


1. the item failed at time $R_{ij}$ ($T_{ij} \le R_{ij} \le t$)

2. the item remains in operation ($T_{ij} \le t \le R_{ij}$)

3. the item has not yet entered the test ($t \le T_{ij} \le R_{ij}$).

For the $n_{it}$ items from group i which are in states 1 or 2 at time t,
define $u_{ijt}$, the survival time in "exponential" units, as

$$u_{ijt} = \begin{cases} (R_{ij} - T_{ij})^v, & \text{if the (i, j)th item is in state 1 at time } t, \\ (t - T_{ij})^v, & \text{if the (i, j)th item is in state 2 at time } t. \end{cases}$$

Let

$$\delta_{ijt} = \begin{cases} 1, & \text{if the (i, j)th item is in state 1 at time } t, \\ 0, & \text{if the (i, j)th item is in state 2 at time } t, \end{cases}$$

and define $D_{it} = \sum_j \delta_{ijt}$ and $U_{it} = \sum_j u_{ijt}$.
Note that at time t, $u_{ijt}$ denotes the length of time (after
transforming to exponential units) that the (i, j)th item has been in
operation, $n_{it}$ denotes the number of items from group i which have
entered the experiment, $D_{it}$ denotes the number of items from group i
which have failed and $U_{it}$ denotes the accumulated, transformed time
for all items in group i.
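
The bookkeeping above can be sketched in a few lines of Python. This is a
minimal illustration, not code from the report: the function name
`group_totals` and its argument layout are assumptions, and an item whose
entry time equals t is treated here as already entered.

```python
def group_totals(entry_times, failure_times, t, v):
    """Return (n_t, D_t, U_t) for one group at calendar time t.

    entry_times[j] and failure_times[j] play the roles of T_ij and R_ij;
    v is the known Weibull shape parameter.
    """
    n_t, d_t, u_t = 0, 0, 0.0
    for entry, failure in zip(entry_times, failure_times):
        if entry > t:
            continue                      # state 3: not yet in the test
        n_t += 1
        if failure <= t:
            d_t += 1                      # state 1: failed by time t
            u_t += (failure - entry) ** v
        else:
            u_t += (t - entry) ** v       # state 2: still in operation
    return n_t, d_t, u_t
```

For example, with entry times 0, 0.5 and 2.0, failure times 1.0, 3.0 and
2.6, t = 1.5 and v = 2, the first item has failed, the second is still
running and the third has not entered, so the function returns n = 2,
D = 1 and U = 1 + 1 = 2.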

The probability that an item in group i fails before t units of operation
time (in exponential units) is $P_i(t) = 1 - e^{-\lambda_i t}$, where
$\lambda_i = \gamma_i^v$ is the exponential parameter for group i. The
purpose of the test is to determine whether $\lambda_1$ and $\lambda_2$
differ. Throughout this report the one-sided test
$H_0: \lambda_1 = \lambda_2$ versus $H_1: \lambda_1 < \lambda_2$ is
considered. Extensions to the test with the inequality reversed or to the
two-sided test are straightforward. See Canner [1] for construction of the
two-sided test.

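The maximum likelihood estimate of $\lambda_1 - \lambda_2$ is given by

$$\hat{\lambda}_1 - \hat{\lambda}_2 = \frac{D_{1t}}{U_{1t}} - \frac{D_{2t}}{U_{2t}}.$$

An estimator of the variance of $\hat{\lambda}_1 - \hat{\lambda}_2$ is
given by

$$\widehat{\mathrm{Var}}(\hat{\lambda}_1 - \hat{\lambda}_2) = \bar{\lambda}^2 \left[ \frac{n_{1t} + n_{2t}}{D_{1t} + D_{2t}} \right] \left[ \frac{1}{n_{1t}} + \frac{1}{n_{2t}} \right],$$

where $\bar{\lambda} = (D_{1t} + D_{2t})/(U_{1t} + U_{2t})$.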

It is decided in advance that the data will be reviewed K times during
the course of the experiment, at intervals of $T_F/K$ years. For the kth
look at the data (at time $t = kT_F/K$), the following test statistic is
computed:

$$z_k = \frac{\hat{\lambda}_1 - \hat{\lambda}_2}{\sqrt{\widehat{\mathrm{Var}}(\hat{\lambda}_1 - \hat{\lambda}_2)}}.$$

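The decision rule is

1. If $z_k < A$, stop the experiment and conclude $H_1$.

2. If $z_k \ge A$,

a. continue the experiment if $k < K$

b. stop the experiment and conclude $H_0$ if $k = K$.

A is the critical value of the test, chosen so that under $H_0$ the
probability that $z_k$ falls at or below A at any of the K looks is
$\alpha$, that is, $P(\min_k z_k \le A \mid H_0) = \alpha$, for a given
level of significance, $\alpha$.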
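
A minimal sketch of the test statistic and the decision rule follows. It
uses the pooled estimate (total failures divided by total transformed time)
in the variance estimator described above. The function names and the
returned strings are illustrative assumptions, and returning 0 when a look
has no usable data (no failures yet, or an empty group) is a convention
chosen for this sketch, not something the report specifies.

```python
import math


def z_statistic(n1, d1, u1, n2, d2, u2):
    """z_k from the per-group counts (n_it, D_it, U_it) at one look."""
    if min(n1, n2) == 0 or d1 + d2 == 0 or min(u1, u2) <= 0.0:
        return 0.0  # assumption: treat an uninformative look as "no evidence"
    diff = d1 / u1 - d2 / u2                      # lambda_1 hat - lambda_2 hat
    pooled = (d1 + d2) / (u1 + u2)                # lambda bar
    var = pooled ** 2 * ((n1 + n2) / (d1 + d2)) * (1.0 / n1 + 1.0 / n2)
    return diff / math.sqrt(var)


def decide(z_k, critical_value, k, looks):
    """Apply the decision rule at look k of `looks` planned looks."""
    if z_k < critical_value:
        return "stop: conclude H1"
    if k < looks:
        return "continue"
    return "stop: conclude H0"
```

With 50 items per group, 10 failures in 100 transformed time units for
group 1 and 20 failures in 80 units for group 2, the estimated difference
is 0.10 - 0.25 = -0.15, the variance estimate is 1/270, and z is about
-2.46. Against a hypothetical critical value of -2.0 the rule would stop
and conclude $H_1$.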

Due to the flexibility of the Canner test, analytic construction of
the exact test quickly becomes intractable. Therefore, a computer simulation
is used to construct an approximate test for a given set of restrictions:
sample size ($n_1$, $n_2$), maximum length of test ($T_F$), number of looks
at the data (K), length of entry period ($T_E$), value of Weibull scale
parameter under the null hypothesis of no difference between group 1 and
group 2 ($\gamma_1 = \gamma_2$), value of the Weibull shape parameter (v),
significance level ($\alpha$). Following is an outline of the simulation
procedure.

1. Specify $n_1$, $n_2$, $T_F$, K, $T_E$, $\gamma_1 = \gamma_2$, v, $\alpha$.

2. Simulate entry times $T_{ij}$ by generating independent, uniformly
distributed random numbers between 0 and $T_E$. For the case of
simultaneous entry, $T_E = 0$ and $T_{ij} = 0$
($i = 1, 2$; $j = 1, \ldots, n_i$).

3. Generate a new set of independent uniform random numbers between
0 and 1 and use these to simulate failure times in accordance with
the failure distribution under the null hypothesis as follows.
Suppose that for the (i, j)th item, the random number x is chosen.
Then $x = 1 - e^{-\gamma_i^v y^v}$, which implies that the operation time
until failure is $y = [-\ln(1 - x)]^{1/v}/\gamma_i$. Thus the simulated
failure time is
$R_{ij} = T_{ij} + y = T_{ij} + [-\ln(1 - x)]^{1/v}/\gamma_i$.
If $R_{ij} > T_F$, that item is considered to have survived the maximum
length of the experiment without failure.

4. Calculate $z_k$ ($k = 1, \ldots, K$) using the values simulated in
steps 2 and 3.

5. Define $z = \min_{1 \le k \le K} z_k$.

6. Repeat steps 2 to 5 m times. The lower $100\alpha$ percentile of the m
values of z will define the appropriate critical value, A. For the
special case of the symmetric distribution of $z_k$ resulting from equal
sample sizes ($n_1 = n_2$), the critical value is defined by the lower
$100 \cdot 2 \cdot \alpha$ percentile of the m values of $-|z|$. For this
report, the value of m was 5000 for each simulation of a critical value.
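
The six steps can be sketched with the Python standard library as below.
This is an illustration under stated assumptions rather than the program
used for the report: all function and parameter names are invented for the
sketch, a look with no usable data is assigned z = 0, the plain lower
percentile is used even when the sample sizes are equal, and the design
values in the usage note are hypothetical.

```python
import math
import random


def simulate_group(n, gamma, v, t_entry, rng):
    """Steps 2 and 3: (entry time, failure time) pairs for one group."""
    items = []
    for _ in range(n):
        entry = rng.uniform(0.0, t_entry) if t_entry > 0 else 0.0
        x = rng.random()                                   # uniform on [0, 1)
        life = (-math.log(1.0 - x)) ** (1.0 / v) / gamma   # inverse Weibull cdf
        items.append((entry, entry + life))
    return items


def group_totals(items, t, v):
    """(n_t, D_t, U_t) for one group at calendar time t."""
    n_t, d_t, u_t = 0, 0, 0.0
    for entry, failure in items:
        if entry > t:
            continue
        n_t += 1
        if failure <= t:
            d_t += 1
            u_t += (failure - entry) ** v
        else:
            u_t += (t - entry) ** v
    return n_t, d_t, u_t


def z_statistic(n1, d1, u1, n2, d2, u2):
    if min(n1, n2) == 0 or d1 + d2 == 0 or min(u1, u2) <= 0.0:
        return 0.0  # assumption: uninformative look
    pooled = (d1 + d2) / (u1 + u2)
    var = pooled ** 2 * ((n1 + n2) / (d1 + d2)) * (1.0 / n1 + 1.0 / n2)
    return (d1 / u1 - d2 / u2) / math.sqrt(var)


def min_z(n1, n2, gamma, v, t_entry, t_max, looks, rng):
    """Steps 2 to 5 for one simulated experiment under the null hypothesis."""
    g1 = simulate_group(n1, gamma, v, t_entry, rng)
    g2 = simulate_group(n2, gamma, v, t_entry, rng)
    zs = []
    for k in range(1, looks + 1):
        t = k * t_max / looks
        zs.append(z_statistic(*group_totals(g1, t, v), *group_totals(g2, t, v)))
    return min(zs)


def critical_value(alpha, m, seed, **design):
    """Step 6: lower 100*alpha percentile of m simulated values of z."""
    rng = random.Random(seed)
    zs = sorted(min_z(rng=rng, **design) for _ in range(m))
    return zs[max(0, math.ceil(alpha * m) - 1)]
```

A call such as `critical_value(0.05, 1000, seed=1, n1=50, n2=50, gamma=0.5,
v=1.5, t_entry=0.0, t_max=1.25, looks=1)` gives a value in the neighborhood
of the normal 5% point, and increasing `looks` with the same seed can only
move the critical value further into the lower tail, since the minimum over
more looks is never larger.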
III. BEHAVIOR OF THE TEST


By simulating an experiment under various sets of restrictions, much
can be learned about the behavior of the test. The effects of changing
restrictions on the critical region are given by Canner [1]. Once a critical
region is constructed for a given set of restrictions, the experiment can
be simulated with only the value of $\gamma_2$ changed (i.e., under the
alternative hypothesis) to estimate the power of the test and the expected
test length.

The  proportion  of  the  m  values  of  z  which  fall  in  the  critical  region  is 
the  estimate  of  power.  In  addition,  the  expected  test  length  is  estimated 
by  averaging  the  m  simulated  times  until  decision.  For  this  report,  the 
value  of  m  was  3000  for  each  simulation  of  power  and  expected  test  length. 
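
The two estimates can be sketched as follows. The function assumes that
each simulated experiment has already been reduced to its sequence of look
statistics $z_1, \ldots, z_K$; the name `power_and_expected_length` and the
list-of-lists input are assumptions made for the illustration.

```python
def power_and_expected_length(z_paths, critical_value, t_max):
    """Estimate power and expected test length from simulated z sequences.

    z_paths: one list [z_1, ..., z_K] per simulated experiment.
    critical_value: the critical value A.
    t_max: maximum test length; look k is taken at time k * t_max / K.
    """
    rejections = 0
    total_time = 0.0
    for zs in z_paths:
        looks = len(zs)
        stop_look = looks              # default: run to the final look
        rejected = False
        for k, z in enumerate(zs, start=1):
            if z < critical_value:     # early (or final) rejection
                stop_look, rejected = k, True
                break
        rejections += rejected
        total_time += stop_look * t_max / looks
    m = len(z_paths)
    return rejections / m, total_time / m
```

For four hypothetical experiments with three looks over 3 years and
A = -2.0, the sequences (-0.5, -2.5, -1.0), (0.1, 0.2, 0.3),
(-3.0, 0.0, 0.0) and (-1.0, -1.0, -2.2) stop at 2, 3, 1 and 3 years, so the
estimated power is 3/4 and the estimated expected test length is 2.25
years.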

A.  ACCURACY  OF  POWER  ESTIMATES 

Before powers can be meaningfully compared it is necessary to determine
how accurate the power approximations are. Canner [1] deals with part of
this question indirectly by constructing 95% confidence intervals for
critical values using a nonparametric method given by Kendall and Stuart [2].
By estimating the power of a test using the upper and lower 95% confidence
limits on the critical value (as well as the point estimate of the critical
value), a "narrow" confidence interval can be calculated for the power
estimate. The qualification, "narrow," is used because the additional
component of variability due to the run to run differences of the power
simulation is absent.

To estimate this component, a power simulation can be replicated several
times with only different random number streams, and a 95% confidence interval
constructed from the power estimates. The sum of the width of this interval
and the width of the "narrow" interval mentioned earlier gives the
approximate width of the true 95% confidence interval. Ideally, this procedure
should be followed for each set of restrictions considered. However, since
only a rough idea of the magnitude of error is usually needed, just a few
representative cases were used. From these, it was concluded that the power
estimates are accurate to about ±.027, 95% of the time. It should be stressed
that this is nothing more than a rough rule of thumb.
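
As a rough cross-check of the run-to-run component alone, the standard
binomial approximation can be used. This is not the replication procedure
described above, only an order-of-magnitude sketch: a power estimate
$\hat{p}$ based on m independent simulated experiments has an approximate
95% half-width of

```latex
1.96\sqrt{\hat{p}(1-\hat{p})/m} \;\le\; 1.96\sqrt{0.25/3000} \approx 0.018
\qquad (m = 3000).
```

This bound covers only simulation noise for a fixed critical value; the
uncertainty in the critical value itself is the part that the "narrow"
interval accounts for.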

B.  CANNER’S  RESULTS 

Some of Canner's results [1] deal with the effects that varying sample
size, scale parameter and number of looks at the data have on the critical
region. Cases with both simultaneous and staggered entry are considered.
He also discusses the effect that increasing the number of looks has on the
power of the test, noting a moderate loss in power as the number of looks
increases.

In an investigation of robustness, Canner shows the test to be quite
robust against changes in entry distribution and length of test. As it is
not the purpose of this report to give a complete review of Canner's work,
the reader is referred to Canner's paper [1] for more results and details.

C.  EXAMPLE  OF  AN  EXPERIMENT 

For  the  remainder  of  this  section,  an  example  experiment  will  be  utilized. 
This  example  is  based  on  a  bearing  fatigue-life  test  being  considered  by  the 
Naval  Ship  Research  and  Development  Center.  The  control  group  consists  of 
bearings which are expected to have $B_{10}$ lives of at least 10,000 hours. A
bearing is said to have a $B_{10}$ life of h hours if the bearing fails within
h hours of operation time with probability .10. The purpose of the study
is to test whether bearings of the type in the experimental group have
$B_{10}$ lives significantly less than 10,000 hours. The sample size in each
group is 50.

Bearings in both groups are assumed to have Weibull fatigue-life
distributions with shape parameter 1.5 and scale parameters which are
functions of the $B_{10}$ lives. It is convenient to express the hypotheses in
terms of $B_{10}$ lives rather than scale parameters. Letting $B_A$ and $B_B$
denote the $B_{10}$ lives, in thousands of hours, of the control group and the
treatment group respectively, the hypotheses can be written

$H_0$: $B_A = 10$, $B_B = 10$

$H_1$: $B_A = 10$, $B_B < 10$.

For the power calculations which follow (except where noted otherwise) the
value of $B_B$ under the alternative hypothesis is assumed to be 5.0.
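
The link between a $B_{10}$ life and the Weibull scale parameter can be
sketched as follows. The sketch assumes the distribution function
$1 - e^{-(\gamma t)^v}$ used in the simulation outline of section II, so
that setting the failure probability at the $B_{10}$ life to .10 gives
$\gamma = [-\ln(0.9)]^{1/v}/B_{10}$. The function names are illustrative,
and time is left in thousands of hours because the report does not state
how operating hours map onto calendar years.

```python
import math


def scale_from_b10(b10, v):
    """Scale parameter gamma such that P(life <= b10) = 0.10."""
    return (-math.log(0.9)) ** (1.0 / v) / b10


def weibull_cdf(t, gamma, v):
    """Weibull distribution function in the (gamma * t) ** v form."""
    return 1.0 - math.exp(-((gamma * t) ** v))


gamma_a = scale_from_b10(10.0, 1.5)   # control group under both hypotheses
gamma_b = scale_from_b10(5.0, 1.5)    # experimental group under the alternative
```

Halving the $B_{10}$ life doubles $\gamma$, so in exponential units the
failure-rate parameter of the experimental group is larger by a factor of
$2^{1.5}$, or about 2.8, under the alternative considered here.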


D.  RESULTS  CONCERNING  POWER  OF  THE  TEST 


An alternative to Canner's test, which can be used when all items enter
the test simultaneously and the data is reviewed only once at the end of the
experiment, is to invoke the Central Limit Theorem and compare the average
censored lifetimes of the two groups. The test used is essentially the
standard two-sample t-test. Letting $x_{ij}$ denote the censored lifetime of
the (i, j)th item,

$$\bar{x}_i = \sum_{j=1}^{n_i} x_{ij}/n_i \quad \text{and} \quad s_i^2 = \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2/(n_i - 1),$$

the test statistic is

$$W = (\bar{x}_1 - \bar{x}_2)/[(s_1^2/n_1) + (s_2^2/n_2)]^{1/2}.$$

Under $H_0$ and for large $n_1$ and $n_2$, W has essentially a standard normal
distribution. Note that for this simple case ($T_E = 0$, K = 1) the major
difference between the t-test and the Canner test is the test statistic. A
1.25 year experiment was simulated to compare the powers of the t-test and
the Canner test. The results, on lines 1 and 2 of Figure 1, show that the
power of the Canner test is higher than that of the t-test.
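
The statistic W can be computed with the standard library as sketched
below. The helper `censor`, which truncates each lifetime at the end of the
test, reflects one reading of "censored lifetime" for the simultaneous
entry case and is an assumption of this sketch, as are the function names.

```python
import math
import statistics


def censor(lifetimes, t_max):
    """Censored lifetimes: each lifetime truncated at the test length."""
    return [min(life, t_max) for life in lifetimes]


def w_statistic(x1, x2):
    """Two-sample statistic W from the censored lifetimes of each group."""
    mean1, mean2 = statistics.mean(x1), statistics.mean(x2)
    var1, var2 = statistics.variance(x1), statistics.variance(x2)  # n - 1 divisor
    return (mean1 - mean2) / math.sqrt(var1 / len(x1) + var2 / len(x2))
```

For the toy samples (1, 2, 3) and (2, 4, 6) the means are 2 and 4, the
sample variances are 1 and 4, and W = -2 / sqrt(5/3), roughly -1.55.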

A second, more realistic experiment was simulated to illustrate the
effects that staggered entry and repeated looks at the data have on power
and expected test length. In this experiment, an entry period of 2.5 years
($T_E = 2.5$) and a maximum length of 2.5 years ($T_F = 2.5$) were assumed.
The powers and expected test lengths were estimated for both K = 1 and for
K = 10. The results, on lines 3 and 4 of Figure 1, show that when repeated
looks are made at the data, a moderate loss in power is offset by a
significant decrease in expected test length. (For the case K = 1, expected
test length is, of course, exactly the full length of the test, $T_F$.)

Another interesting comparison can be made from lines 2 and 3. Note that
in the experiment represented by line 3, items enter over the whole course of
the test so that the average follow-up time is $.5\,T_F$ or 1.25 years. Thus
lines 2 and 3 represent experiments with equal average follow-up times. When
average follow-up time is held fixed and one look at the data is made, staggered
entry appears to increase the power slightly. It might be argued that due to
the error bounds of ±.027 around each power estimate, there is no significant
difference between the powers of the staggered and simultaneous entry cases.
However, while the difference for this particular case is quite small, it
does exist; an intuitive argument will be given in section III.E.

In the experiment represented by line 4 of Figure 1, the items have a
staggered entry and the data is reviewed quarterly (every three months) with
an average follow-up time of 1.25 years. Consider the experiment represented
by line 5, also with quarterly review and average follow-up time of 1.25
years, but with simultaneous entry. Referring to lines 4 and 5, it is seen
that there is a slight increase in power due to the staggered entry when the
period of data review and average follow-up time are held fixed.

Note that in the last two examples the effect of staggered entry has
been to increase the power. However, for cases where the test length is
relatively long compared to item lifetimes (cases not likely to be encountered
in practice) staggered entry can have a detrimental effect on power; an
intuitive explanation for this will be offered in section III.E.

Consider two such "long" experiments, each with an average follow-up
time of 6 years and with yearly review of the data. In one experiment,
$T_E = 0$ and $T_F = 6$ while in the other experiment, $T_E = 12$ and
$T_F = 12$. For these experiments, $B_B = 6.7$ is used in the alternative
hypothesis instead of $B_B = 5.0$ so that the resulting powers have easily
comparable values. Here, the effect of staggered entry (as is shown on lines
6 and 7) is to decrease the power.
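
The average follow-up times quoted in this section can be reproduced with a
short calculation. The sketch assumes uniform entry over the first $T_E$
years and takes an item's follow-up time to be the time from its entry to
the end of the test, ignoring failures:

```latex
E[T_F - T_{ij}] = T_F - \tfrac{1}{2}T_E ,
\qquad
\begin{array}{ll}
(T_E, T_F) = (0, 1.25): & 1.25 \text{ years} \\
(T_E, T_F) = (2.5, 2.5): & 2.5 - 1.25 = 1.25 \text{ years} \\
(T_E, T_F) = (0, 6): & 6 \text{ years} \\
(T_E, T_F) = (12, 12): & 12 - 6 = 6 \text{ years}
\end{array}
```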

E.  EFFECTS  OF  STAGGERED  ENTRY 

The results given above are not surprising, except perhaps for the
difference in the effect of staggered entry for short and long tests. When the
underlying distribution is in fact exponential (i.e., Weibull with v = 1.0)
or Weibull with 0 < v < 1, this difference does not occur; the effect of
staggered entry then is always to decrease the power. For Weibull
distributions with v > 1, the detrimental effect of staggered entry on power
occurs only for "long" tests. In this case, the detrimental effect occurs only
for longer and longer tests as v increases. Recall that in the examples used
above v = 1.5. The following paragraphs provide an intuitive explanation
of this phenomenon.

In  general,  an  uncensored  observation  (knowing  the  actual  lifetime  of 
an  item)  gives  more  information  and  yields  better  estimates  and  more  powerful 
tests  than  a  censored  observation  (knowing  only  that  the  lifetime  of  the  item 
exceeds  some  fixed  time).  Thus,  as  the  probability  of  failure  before  termin¬ 
ation  of  a  test  increases,  so  does  the  power  of  the  test. 

Let F(t) represent the cumulative distribution function of an underlying
fatigue-failure probability distribution. F(t) is the probability that an
item fails before t years of operation time. Now consider an extreme type
of staggered entry wherein half of the items enter the test at time 0 and the
other half enter at time t'. If the test is terminated at time t" (t" > t'),
then half of the items will have follow-up time $t_1 = t'' - t'$ and half of
the items will have follow-up time $t_2 = t''$. The average follow-up time is
$(t_1 + t_2)/2$ and the average probability of failure before termination of
the test is $F_{stag} = [F(t_1) + F(t_2)]/2$. For the corresponding
simultaneous entry test with the same follow-up time, the average probability
of failure before termination of the test is $F_{sim} = F[(t_1 + t_2)/2]$.
Thus, staggered entry will increase power when $F_{stag} > F_{sim}$ and will
decrease power when $F_{stag} < F_{sim}$.

Figures 2 and 3 illustrate how the effect of staggered entry can vary,
depending on the length of the test and the underlying distribution. In
Figure 2, the cumulative distribution function of an exponential distribution
is shown. It can be seen that due to the concave shape of the function,
$F_{stag} \le F_{sim}$ for any values of $t_1$ and $t_2$. The same phenomenon
occurs for any Weibull distribution with $0 < v \le 1.0$. In Figure 3, the
cumulative distribution function of the Weibull distribution with shape
parameter v = 2.5 is shown. (Any v > 1 could have been used.) Note that for the
larger values of $t_1$ and $t_2$, $F_{stag} < F_{sim}$ while for the smaller
values of $t_1$ and $t_2$ the direction of the inequality is reversed. Thus,
staggered entry can have a positive effect on power for short tests with a
fixed average follow-up time.

[Figure 2. Cumulative Distribution Function of Exponential Distribution]
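
The comparison of $F_{stag}$ and $F_{sim}$ is easy to reproduce
numerically. The sketch below assumes a Weibull distribution function of
the form $1 - e^{-(\gamma t)^v}$ with $\gamma = 1$; the function names and
the particular follow-up times are chosen only for illustration.

```python
import math


def weibull_cdf(t, gamma, v):
    """Weibull distribution function in the (gamma * t) ** v form."""
    return 1.0 - math.exp(-((gamma * t) ** v))


def stag_vs_sim(t1, t2, gamma, v):
    """Return (F_stag, F_sim) for follow-up times t1 and t2."""
    f_stag = 0.5 * (weibull_cdf(t1, gamma, v) + weibull_cdf(t2, gamma, v))
    f_sim = weibull_cdf(0.5 * (t1 + t2), gamma, v)
    return f_stag, f_sim


short_exp = stag_vs_sim(0.2, 0.6, 1.0, 1.0)    # exponential case
short_wei = stag_vs_sim(0.2, 0.6, 1.0, 2.5)    # v = 2.5, short follow-up
long_wei = stag_vs_sim(1.5, 2.5, 1.0, 2.5)     # v = 2.5, long follow-up
```

For the exponential case the staggered value is the smaller of the pair.
With v = 2.5 the staggered value is larger for the short follow-up times,
where the distribution function is still convex, and smaller for the long
ones.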

When average follow-up time is fixed, it is important to note that,
from a practical standpoint, the effect of staggered entry on power is
largely a moot point. If an experimenter does have the opportunity to
choose between different entry distributions and test lengths, it is always
best to extend the average follow-up time as long as possible by using
simultaneous entry and a long maximum test length. However, even if entry
distribution and maximum test length are fixed (thus fixing an average
follow-up time) it may be useful to estimate how much power is "lost" due
to a staggered entry.

IV. DISCUSSION AND SUMMARY


While it is useful to exhibit the effects that entry time, test
length, number of looks and other design factors have on power with actual
(simulated) examples, it must be remembered that it is often difficult to
use these few examples to formulate hard and fast rules. A complete analysis
of the interacting effects of various factors is not within the scope of
this report.

However, the following statements about the Canner test can be made with
high confidence. Some of the statements are drawn from well known results
in the area of hypothesis testing while others are drawn from the results
given in Canner's paper [1] and in section III of this report.

1.  The  Canner  test  is  more  powerful  than  the  t-test. 

2.  Power  increases  as 

(a)  maximum  length  of  test  increases. 

(b) true $B_{10}$ life of the experimental group decreases.

(c) number of looks at the data decreases.

3. Expected test length decreases as

(a) number of looks at the data increases.

(b) true $B_{10}$ life of the experimental group decreases.

4. When average follow-up time is fixed and the underlying distribution
is Weibull with $0 < v \le 1.0$, staggered entry decreases the power.

5. When average follow-up time is fixed and the underlying distribution
is Weibull with v > 1,

(a) staggered entry increases the power for "short" tests.

(b) staggered entry decreases the power for "long" tests.


In  real-life  applications,  even  in  the  simplest  of  cases,  it  is  likely 
that the experimenter would want to run several simulations to approximate
the  power  of  various  tests  as  an  aid  in  designing  the  experiment.  At  such 
a  time,  the  nature  of  the  interactions  between  maximum  test  length,  entry 
time,  number  of  looks  and  other  design  factors  can  be  determined  over  a 
range  of  designs  which  are  within  economic  and  time  constraints.  In 
addition,  the  relative  importance  of  power  and  expected  test  length  can  be 
considered  as  part  of  the  design  process. 